Evaluation of Handwriting Recognition Systems for Application to Historical Records

Authors

  • Patrick Schone
  • Heath Nielson
  • Mark Ward
Abstract

In the last decade, significant, largely governmental funding has been applied to the automatic transcription of handwritten documents. Uses for this kind of technology are somewhat limited, given that the number of handwritten documents is on the decline. However, certain types of handwritten historical records can be crucial for genealogical research in that they identify key vital facts. In recent years, organizations like FamilySearch have expended huge efforts to identify, digitize, and transcribe these kinds of genealogically rich records. Until now, such transcription has largely been done through massive crowd-sourced labor. We believe handwriting recognition technology is only a few years away from profitable application to genealogical documents. To test this hypothesis, we developed an evaluation paradigm for measuring handwriting recognition performance on four data collections of differing genres and languages. We invited research organizations to participate in the evaluation and compared performance to the outcome of human annotation. In this paper, we provide the details of this paradigm, including the guidelines, corpora, and evaluation tools. Then we present the exciting system results, which suggest that the state of the art is very close to providing real-world benefit to the automatic transcription of genealogically rich documents.

Similar resources

Corpus and Evaluation of Handwriting Recognition of Historical Genealogical Records

Over the last few decades, significant strides have been made in handwriting recognition (HR), which is the automatic transcription of handwritten documents. HR often focuses on modern handwritten material, but in the electronic age, the volume of handwritten material is rapidly declining. However, we believe HR is on the verge of having major application to historical record collections. In re...

Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...

An Empirical Evaluation of Off-line Arabic Handwriting And Printed Characters Recognition System

Handwriting recognition is a challenging task for many real-world applications such as document authentication, form processing, and the interpretation of historical documents. This paper presents a comparative study of off-line recognition systems for handwriting and printed characters, taking Arabic handwriting as the case. The off-line Handwriting Recognition methods for Arabic words which being often used among then across the...

Fusion of Segmentation Strategies for Off-Line Cursive Handwriting Recognition

Cursive handwriting recognition is a challenging task for many real-world applications such as document authentication, form processing, postal address recognition, reading machines for the blind, bank check recognition, and interpretation of historical documents. Therefore, in the last few decades, researchers have put an enormous effort into developing various techniques for handwriting recog...

Flexible Computer Assisted Transcription of Historical Documents Through Subword Spotting

In the absence of accurate handwriting recognition for historical documents, computer assisted transcription (CAT) methods move into the spotlight. We explore some of the weaknesses of current CAT systems and propose a CAT system that relies on subword spotting and overcomes most of them. The system is ideal for crowdsourcing transcription to mobile users.

Publication date: 2013